feat: metrics reporting for scan and commit #589
Conversation
(force-pushed 5fa02cb to 0a85180)
Let me know when ready for review :)
For sure! Most likely tomorrow.
(force-pushed eaacbe8 to 5241044)
(force-pushed 5241044 to d26fb96)
Thanks for updating this! I still need to take some time to get familiar with the Java implementation and its API in the REST spec before reviewing it.
wgtmac
left a comment
Thanks for working on this! I have some general questions on the design, especially on the capability of customization. IMO we can focus on the API design for now and later integrate with other classes. Please let me know what you think.
struct ICEBERG_EXPORT ScanReport {
  /// \brief The fully qualified name of the table that was scanned.
  std::string table_name;

Why do some metrics have blank lines in between while others don't? I think we can remove all these blank lines to be compact.
int64_t snapshot_id = -1;

/// \brief Filter expression used in the scan, if any.
std::string filter;

Should this be std::shared_ptr<Expression>?
namespace iceberg {

/// \brief Duration type for metrics reporting in milliseconds.
using DurationMs = std::chrono::milliseconds;

This hard-codes the reported duration unit to milliseconds, which I think violates the spec.
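One way to avoid baking the unit into the type is to carry it in the result value, the way the REST spec's timer-result object pairs a duration with an explicit time-unit string. The sketch below is illustrative only; `TimerResult` and its field names are assumptions, not the PR's actual API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical timer result carrying its unit explicitly, loosely mirroring
// the REST spec's timer-result {count, time-unit, total-duration} shape,
// instead of hard-coding milliseconds into a type alias.
struct TimerResult {
  std::string time_unit = "nanoseconds";  // serialized alongside the value
  int64_t total_duration = 0;             // interpreted in `time_unit`
  int64_t count = 0;

  static TimerResult FromDuration(std::chrono::nanoseconds d, int64_t count) {
    return TimerResult{"nanoseconds", d.count(), count};
  }
};
```

With this shape, collection can stay in `std::chrono` internally and the unit becomes a serialization detail rather than a compile-time commitment.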
std::string filter;

/// \brief Schema ID.
int32_t schema_id = -1;

We are missing some fields like projectedFieldIds and projectedFieldNames from the Java implementation. I think they are required by the REST spec: https://github.com/apache/iceberg/blob/149cc464f9b7df800cc5718af725983473819504/open-api/rest-catalog-open-api.yaml#L3990-L4023
std::string table_name;

/// \brief The snapshot ID created by this commit.
int64_t snapshot_id = -1;

Please use kInvalidSnapshotId defined in iceberg/constant.h.
/// \brief Total number of data manifests.
int64_t total_data_manifests = 0;

/// \brief Number of data files that were skipped.
int64_t skipped_data_files = 0;

/// \brief Number of delete files that were skipped.
int64_t skipped_delete_files = 0;
int32_t schema_id = -1;

/// \brief Total duration of the entire scan operation.
DurationMs total_duration{0};

Should we remove this, since it is not defined by the Java implementation?
///
/// This variant type allows handling both report types uniformly through
/// the MetricsReporter interface.
using MetricsReport = std::variant<ScanReport, CommitReport>;
If we define MetricsReport as a std::variant, we cannot support customizing metrics reports. For example, engines may have more metrics to report than defined by the Java implementation. Even the REST spec does not explicitly define what keys are required.
Instead, should we define the MetricsReport like below?
struct MetricsReport {
  std::string kind;  // can be "scan" or "commit", or whatever customized
  std::unordered_map<std::string, CounterResult> counter_results;
  std::unordered_map<std::string, TimerResult> timer_results;
};

What do you think?
After thinking more on this, I'm fine with your current approach of defining ScanReport and CommitReport with explicit fields. MetricsReports are collected by this library, so users do not have the flexibility to customize them; we only need to customize MetricsReporter.
I like the bag-of-keys approach because it allows customization without necessarily requiring recompilation. Let me think a bit more about the entire thing; I might reach out on Slack for a quick sync.
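For reference, the bag-of-keys shape could pair the map-based report with REST-spec-style result values. Everything below is a sketch: `CounterResult` and `TimerResult` loosely mirror the spec's counter-result {unit, value} and timer-result {count, time-unit, total-duration} objects, and the field names are assumptions rather than the proposed API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical result value types, modeled on the REST spec's counter-result
// and timer-result objects.
struct CounterResult {
  std::string unit;
  int64_t value = 0;
};

struct TimerResult {
  std::string time_unit;
  int64_t total_duration = 0;
  int64_t count = 0;
};

// The bag-of-keys report: the set of metric names is open-ended, so engines
// can add entries without recompiling this library.
struct MetricsReport {
  std::string kind;  // "scan", "commit", or an engine-defined kind
  std::unordered_map<std::string, CounterResult> counter_results;
  std::unordered_map<std::string, TimerResult> timer_results;
};
```

A caller would populate it with spec-style metric names, e.g. `report.counter_results["result-data-files"] = CounterResult{"count", 42};`.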
///
/// \param reporter_type Case-insensitive type identifier (e.g., "noop").
/// \param factory Factory function that produces the reporter.
static void Register(std::string_view reporter_type, MetricsReporterFactory factory);
How do we support a Java-parity CompositeMetricsReporter? It would be useful if we want to report metrics to multiple sinks.
This is fair, I just wanted to show the end-to-end picture for ease of understanding.
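A composite reporter that fans each report out to multiple sinks is small to sketch. The sketch assumes a `MetricsReporter` interface with a single `Report(const MetricsReport&)` method; the actual signatures in this PR may differ, and `MetricsReport` here is a stand-in type.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct MetricsReport {};  // stand-in for the PR's report type

class MetricsReporter {
 public:
  virtual ~MetricsReporter() = default;
  virtual void Report(const MetricsReport& report) = 0;
};

// Hypothetical Java-parity composite: forwards every report to each
// registered sink in insertion order.
class CompositeMetricsReporter : public MetricsReporter {
 public:
  void Add(std::shared_ptr<MetricsReporter> reporter) {
    reporters_.push_back(std::move(reporter));
  }

  void Report(const MetricsReport& report) override {
    for (auto& r : reporters_) r->Report(report);
  }

 private:
  std::vector<std::shared_ptr<MetricsReporter>> reporters_;
};

// Simple counting sink, used only to demonstrate the fan-out.
class CountingReporter : public MetricsReporter {
 public:
  void Report(const MetricsReport&) override { ++count; }
  int count = 0;
};
```

This is also the natural composition point for combining a catalog-configured reporter with a built-in REST reporter, as Java's RESTSessionCatalog does.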
# specific language governing permissions and limitations
# under the License.

iceberg_install_all_headers(iceberg/metrics)
Missing a meson.build equivalent for this subdirectory.
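A meson.build for the subdirectory might look like the fragment below. This is a guess at the shape only: the header names are illustrative, and the repo's meson setup may use a custom wrapper rather than meson's built-in `install_headers`.

```meson
# Hypothetical meson.build for the metrics subdirectory; header names are
# placeholders and should match the files actually added by this PR.
install_headers(
    'metrics.h',
    'metrics_reporter.h',
    subdir: 'iceberg/metrics',
)
```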
wgtmac
left a comment
Follow-up findings from a deeper design/parity pass over the metrics reporting changes.
committed_ = true;
ctx_->table = std::move(commit_result.value());

ReportCommitMetrics();
Transaction::Commit() always calls ReportCommitMetrics() after a successful metadata commit, even for metadata-only transactions. That helper then reads current_snapshot() from the post-commit table and reports it as if this transaction created that snapshot, which can re-emit stale snapshot metrics and a misleading operation name. Java only reports commit metrics from snapshot-producing commits (CreateSnapshotEvent in SnapshotProducer), so the reporting hook should move closer to snapshot-producing updates or otherwise prove that this transaction actually created a new snapshot before emitting a CommitReport.
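The guard suggested above can be as simple as comparing the current snapshot id before and after the commit. `kInvalidSnapshotId` and the free function below are stand-ins for the PR's real types, shown only to make the condition concrete.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the constant defined in iceberg/constant.h.
constexpr int64_t kInvalidSnapshotId = -1;

// Only emit a CommitReport when this commit actually produced a new
// snapshot. Metadata-only commits leave the current snapshot unchanged, so
// comparing the pre- and post-commit snapshot ids filters them out.
bool ShouldReportCommit(int64_t snapshot_id_before, int64_t snapshot_id_after) {
  return snapshot_id_after != kInvalidSnapshotId &&
         snapshot_id_after != snapshot_id_before;
}
```

This avoids re-emitting stale snapshot metrics while keeping the reporting call in Transaction::Commit(); moving the hook into snapshot-producing updates, as Java does, is the fuller fix.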
// Load metrics reporter from catalog properties
std::shared_ptr<MetricsReporter> reporter;
auto reporter_result = MetricsReporters::Load(final_config.configs());
The REST catalog now loads a local reporter from catalog properties, but it never constructs the built-in REST reporter for the /metrics endpoint and never combines the two. As a result, scan/commit reports from tables loaded through RestCatalog are never POSTed back to the server even when the server advertises ReportMetrics support. Java's RESTSessionCatalog explicitly combines the catalog reporter with RESTMetricsReporter, so this leaves the feature only partially implemented on the C++ side.
/// \brief Infer the reporter type from properties.
std::string InferReporterType(
    const std::unordered_map<std::string, std::string>& properties) {
  auto it = properties.find(std::string(kMetricsReporterImpl));
This loader treats metrics-reporter-impl as a lowercase registry key and defaults to noop, while Java and the Iceberg docs treat the same property as a fully qualified class name whose instance is initialized with catalog properties. On top of that, the catalog constructors ignore Load() errors and Table falls back to noop, so a Java-style config string or typo quietly disables metrics instead of surfacing a configuration error. That is a public-contract drift, not just an implementation detail.
while Java and the Iceberg docs treat the same property as a fully qualified class name whose instance is initialized with catalog properties
The divergence is because C++ does not support reflection. I decided to make the key a lowercase local identifier to reduce the risk of typos caused by capitalization.
typo quietly disables metrics instead of surfacing a configuration error
I will update to surface a configuration error.
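One way to surface that configuration error: treat an absent property as "default to noop", but treat a present-yet-unrecognized value as a hard failure. The sketch below uses a simplified `Result` stand-in and an illustrative registry; the real loader would consult the factory registry built by `Register()`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Simplified stand-in for the library's real error-reporting type.
struct LoadResult {
  bool ok;
  std::string error;
};

LoadResult LoadReporter(
    const std::unordered_map<std::string, std::string>& props) {
  auto it = props.find("metrics-reporter-impl");
  if (it == props.end()) {
    return {true, ""};  // property absent: falling back to noop is fine
  }
  // Registry contents are illustrative; the point is that a present but
  // unrecognized value fails loudly instead of quietly disabling metrics.
  static const std::unordered_map<std::string, int> registry = {
      {"noop", 0}, {"logging", 1}};
  if (registry.find(it->second) == registry.end()) {
    return {false, "unknown metrics reporter type: " + it->second};
  }
  return {true, ""};
}
```

The catalog constructors would then propagate the error instead of ignoring the `Load()` result.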
///
/// Embedded in ScanReport and populated by DataTableScan after PlanFiles()
/// completes. Mirrors the fields in Java's ScanMetricsResult.
struct ICEBERG_EXPORT ScanMetrics {
ScanReport embeds a plain ScanMetrics struct of raw integers and std::chrono::nanoseconds, but Java's contract is layered: live counters/timers are collected in ScanMetrics, converted into ScanMetricsResult, and then wrapped by ScanReport. That layering preserves units, optional presence, noop semantics, and serialization behavior. In C++, every metric defaults to 0, so consumers cannot distinguish "not collected" from a real zero value, and the report shape is no longer compatible with Java's richer contract.
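The "optional presence" part of that layering can be captured with `std::optional` on the result fields, so consumers can tell "not collected" apart from a genuine zero, similar to Java's nullable ScanMetricsResult fields. The struct and field names below are a sketch, not the PR's API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical result shape: a disengaged optional means the metric was not
// collected (e.g. a noop counter), while an engaged optional holding 0 means
// the metric was collected and is really zero.
struct ScanMetricsResultSketch {
  std::optional<int64_t> result_data_files;
  std::optional<int64_t> skipped_data_files;
};
```

Serialization can then omit disengaged fields entirely, matching the optional keys in the REST spec's report payloads.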
metric.total_equality_deletes = parse_int64(SnapshotSummaryFields::kTotalEqDeletes);
metric.added_dvs = parse_int64(SnapshotSummaryFields::kAddedDVs);
metric.removed_dvs = parse_int64(SnapshotSummaryFields::kRemovedDVs);
metric.created_manifest_count = parse_int64(SnapshotSummaryFields::kManifestsCreated);
CommitMetrics exposes replaced_manifest_count, but ReportCommitMetrics() never reads SnapshotSummaryFields::kManifestsReplaced into it. Java's CommitMetricsResult.from(...) does populate the corresponding manifestsReplaced field from the snapshot summary. Any consumer relying on this report will currently observe 0 even when the commit actually replaced manifests.
for (const auto& task : tasks) {
  report.scan_metrics.total_file_size_in_bytes +=
      task->data_file()->file_size_in_bytes;
  for (const auto& del_file : task->delete_files()) {
The scan report counts equality and positional delete files, but it never increments indexed_delete_files and it never separates deletion vectors from ordinary positional deletes. Java's ScanMetricsUtil.indexedDeleteFile(...) increments all three counters (indexedDeleteFiles, positionalDeleteFiles/dvs, equalityDeleteFiles). Because the C++ report schema already exposes indexed_delete_files and dvs, leaving them unset produces parity drift and misleading metrics for DV-heavy tables.
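The per-delete-file accounting Java performs in ScanMetricsUtil.indexedDeleteFile(...) can be sketched as below: every indexed delete file bumps the indexed counter, then exactly one of the dvs, positional, or equality counters. The enum, struct, and DV flag here are stand-ins for the PR's real types.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the PR's delete-file content type and scan metrics struct.
enum class DeleteContent { kPositionDeletes, kEqualityDeletes };

struct ScanMetricsSketch {
  int64_t indexed_delete_files = 0;
  int64_t equality_delete_files = 0;
  int64_t positional_delete_files = 0;
  int64_t dvs = 0;
};

// Mirrors the shape of Java's ScanMetricsUtil.indexedDeleteFile(...): the
// indexed counter always increments, and deletion vectors are tracked
// separately from ordinary positional delete files.
void CountDeleteFile(ScanMetricsSketch& m, DeleteContent content,
                     bool is_deletion_vector) {
  m.indexed_delete_files++;
  if (content == DeleteContent::kPositionDeletes) {
    if (is_deletion_vector) {
      m.dvs++;
    } else {
      m.positional_delete_files++;
    }
  } else {
    m.equality_delete_files++;
  }
}
```

Calling this once per delete file in the planning loop would keep the already-declared indexed_delete_files and dvs fields populated.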
namespace iceberg {

/// \brief Metrics collected during a table commit (snapshot creation).
CommitReport only carries summary-derived counters, but Java's commit reporting model also has live commit metrics (totalDuration, attempts) collected through CommitMetrics and serialized through CommitMetricsResult. Because those fields are absent from the C++ public model entirely, even a future implementation cannot fully mirror Java's commit report without another API change.
/// \param io FileIO instance for reading manifest files.
/// \param reporter Optional metrics reporter for scan metrics.
/// \param table_name Optional table name for metrics reporting.
static Result<std::unique_ptr<TableScanBuilder<ScanType>>> Make(
The builder stores a reporter internally, but there is no public scan-level API to add an extra reporter the way Java's Scan.metricsReporter(...) does. That means C++ can only use the reporter injected from the table/catalog path, while Java supports composing an additional reporter for a particular scan. For observability integrations and tests, that is a meaningful flexibility gap.
Good catch, will address.
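A scan-level hook analogous to Java's Scan.metricsReporter(...) could be a fluent builder method that collects extra reporters alongside the one injected from the table/catalog path. The class and method names below are hypothetical, shown only to illustrate the shape.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct MetricsReport {};  // stand-in for the PR's report type

class MetricsReporter {
 public:
  virtual ~MetricsReporter() = default;
  virtual void Report(const MetricsReport&) = 0;
};

class NoopReporter : public MetricsReporter {
 public:
  void Report(const MetricsReport&) override {}
};

// Hypothetical builder hook mirroring Java's Scan.metricsReporter(...):
// reporters added here are reported to in addition to the table-injected one.
class TableScanBuilderSketch {
 public:
  TableScanBuilderSketch& WithMetricsReporter(
      std::shared_ptr<MetricsReporter> reporter) {
    extra_reporters_.push_back(std::move(reporter));
    return *this;
  }

  std::size_t reporter_count() const { return extra_reporters_.size(); }

 private:
  std::vector<std::shared_ptr<MetricsReporter>> extra_reporters_;
};
```

At build time the extra reporters would be combined with the injected one (e.g. via a composite reporter) before the scan is constructed.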
|
I have a design question about the
Initial commit for addressing #533